{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "IN_COLAB = 'google.colab' in sys.modules\n",
    "if IN_COLAB:\n",
    "    !git clone https://github.com/cs357/demos-cs357.git\n",
    "    !mv demos-cs357/figures/ .\n",
    "    !mv demos-cs357/additional_files/ ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import numpy.linalg as la\n",
    "\n",
    "import scipy.optimize as sopt\n",
    "from scipy.optimize import minimize\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "from mpl_toolkits.mplot3d import axes3d\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Numerical Optimization "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1) Example of a constrained optimization problem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we want to maximize the area of a rectangle with sides `d1` and `d2`. \n",
    "\n",
    "We store these design variables in the array `x = np.array([d1,d2])`.\n",
    "\n",
    "We add to this problem a constraint that the perimeter should be less than a given quantity. The functions below evaluate the area and perimeter for input variable `x`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def A(x):\n",
    "    d1 = x[0]\n",
    "    d2 = x[1]\n",
    "    return d1*d2\n",
    "\n",
    "\n",
    "def P(x):\n",
    "    d1 = x[0]\n",
    "    d2 = x[1]\n",
    "    return 2*(d1+d2)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can visualize how the area and perimeter change with `x` using 3d plots:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure()\n",
    "ax = fig.gca(projection=\"3d\")\n",
    "\n",
    "xmesh, ymesh = np.mgrid[0:10:100j,0:10:100j]\n",
    "AMesh = A(np.array([xmesh, ymesh]))\n",
    "ax.plot_surface(xmesh, ymesh, AMesh, cmap=plt.cm.winter, rstride=3, cstride=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig2 = plt.figure()\n",
    "ax = fig2.gca(projection=\"3d\")\n",
    "\n",
    "PMesh = P(np.array([xmesh, ymesh]))\n",
    "ax.plot_surface(xmesh, ymesh, PMesh,  cmap=plt.cm.autumn, rstride=3, cstride=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.axis(\"equal\")\n",
    "plt.grid()\n",
    "figc1 = plt.contour(xmesh, ymesh, AMesh, 20, cmap=plt.cm.winter)\n",
    "figc2 = plt.contour(xmesh, ymesh, PMesh, 10, colors='k')\n",
    "plt.clabel(figc2, inline=1, fontsize=12)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's say we want to solve:\n",
    "\n",
    "```html\n",
    "max A\n",
    "st  P <= 20\n",
    "```\n",
    "\n",
    "or to use with minimize function\n",
    "\n",
    "```html\n",
    "min f = -A\n",
    "st g = 20 - P >= 0\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Initial guess\n",
    "x0 = np.array([9,1])\n",
    "f = lambda x: -A(x) \n",
    "g = lambda x: 20 - P(x)\n",
    "minimize(f, x0, constraints=({'type': 'ineq', 'fun': lambda x: g(x)}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2) 1D Optimization Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this activity, we will find the optimizer of functions in 1d using two iterative methods. For each one, you should be thinking about cost and convergence rate.\n",
    "\n",
    "The iterative methods below can be applied to more complex equations, but here we will use a simple polynomial function of the form:\n",
    "\n",
    "$$f(x) =  a x^4 + b x^3 + c x^2 + d x + e $$\n",
    "\n",
    "The code snippet below provides the values for the constants, and functions to evaluate $f(x)$, $f'(x)$ and $f''(x)$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = 17.09\n",
    "b = 9.79\n",
    "c = 0.6317\n",
    "d = 0.9324\n",
    "e = 0.4565\n",
    "\n",
    "def f(x):\n",
    "    return a*x**4 + b*x**3 + c*x**2 + d*x + e\n",
    "\n",
    "def df(x):\n",
    "    return 4*a*x**3 + 3*b*x**2 + 2*c*x + d\n",
    "\n",
    "def d2f(x):\n",
    "    return 3*4*a*x**2 + 2*3*b*x + 2*c\n",
    "\n",
    "xmesh = np.linspace(-1, 0.5, 100)\n",
    "plt.ylim([-1, 3])\n",
    "plt.plot(xmesh, f(xmesh))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### a) Golden Section Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tau = (np.sqrt(5)-1)/2\n",
    "\n",
    "a0 = -0.9 #-2\n",
    "b0 = -0.2 #1\n",
    "\n",
    "h_k = b0 - a0  \n",
    "\n",
    "x1 = a0 + (1-tau) * h_k\n",
    "x2 = a0 + tau * h_k\n",
    "print(x1,x2)\n",
    "f1 = f(x1)\n",
    "f2 = f(x2)\n",
    "\n",
    "errors = [np.abs(h_k)]\n",
    "count = 0\n",
    "\n",
    "while (count < 30 and np.abs(h_k) > 1e-6):\n",
    " \n",
    "    if  f1>f2:\n",
    "        # complete code\n",
    "    else:\n",
    "        # complete code\n",
    "        \n",
    "    errors.append(np.abs(h_k))  \n",
    "    \n",
    "    print(\"%10g \\t %10g \\t %12g %12g\" % (a0, b0, errors[-1], errors[-1]/errors[-2]))\n",
    "    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### b) Newton's method in 1D"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.ylim([-1, 3])\n",
    "plt.plot(xmesh, f(xmesh))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's fix an initial guess:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x_exact = -0.4549"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = 0.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dfx = df(x)\n",
    "d2fx = d2f(x)\n",
    "\n",
    "# carry out the Newton step\n",
    "xnew = x - dfx / d2fx\n",
    "\n",
    "# plot approximate function\n",
    "plt.plot(xmesh, f(xmesh))\n",
    "plt.plot(xmesh, f(x) + dfx*(xmesh-x) + d2fx*(xmesh-x)**2/2)\n",
    "plt.plot(x, f(x), \"o\", color=\"red\")\n",
    "plt.plot(xnew, f(xnew), \"o\", color=\"green\")\n",
    "plt.ylim([-1, 3])\n",
    "\n",
    "# update\n",
    "x = xnew\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Implement Newton method 1D\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### c) Using scipy library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import scipy.optimize as sopt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sopt.minimize?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x0 = 2\n",
    "sopt.minimize(f, x0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sopt.golden(f,brack=(-8,2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3) ND Optimization Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We provide three example of functions. You will be able to observe difference convergence carachteristics among them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Function 1:\n",
    "$$ f(x,y) = 0.5 x^2 + 2.5 y^2 $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def f1(x):\n",
    "    return 0.5*x[0]**2 + 2.5*x[1]**2\n",
    "\n",
    "def df1(x):\n",
    "    return np.array([x[0], 5*x[1]])\n",
    "\n",
    "def ddf1(x):\n",
    "    return np.array([\n",
    "                     [1,0],\n",
    "                     [0,5]\n",
    "                     ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Function 2:\n",
    "$$ f(x,y) = (x-1)^2 + (y-1)^2 $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def f2(x):\n",
    "    return (x[0]-1)**2 + (x[1]-1)**2\n",
    "\n",
    "def df2(x):\n",
    "    return np.array([2*(x[0]-1),2*(x[1]-1) ])\n",
    "\n",
    "def ddf2(x):\n",
    "    return np.array([\n",
    "                     [2,0],\n",
    "                     [0,2]\n",
    "                     ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Function 3:\n",
    "$$ f(x,y) = 100 (y-x^2)^2 + (1-x)^2 $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def f3(X):\n",
    "    x = X[0]\n",
    "    y = X[1]\n",
    "    val = 100.0 * (y - x**2)**2 + (1.0 - x)**2\n",
    "    return val\n",
    "\n",
    "def df3(X):\n",
    "    x = X[0]\n",
    "    y = X[1]\n",
    "    val1 = -400.0 * (y - x**2) * x - 2 * (1 - x)\n",
    "    val2 = 200.0 * (y - x**2)\n",
    "    return np.array([val1, val2])\n",
    "\n",
    "def ddf3(X):\n",
    "    x = X[0]\n",
    "    y = X[1]\n",
    "    val11 = -400.0 * (y - x**2) + 800.0 * x**2 + 2\n",
    "    val12 = -400.0 * x\n",
    "    val21 = -400.0 * x\n",
    "    val22 = 200.0\n",
    "    return np.array([[val11, val12], [val21, val22]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Helper functions for plotting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plotFunction(f, interval=(-2,2), levels=20, steps=None, fhist=None):\n",
    "    \n",
    "    a,b = interval\n",
    "    \n",
    "    xmesh, ymesh = np.mgrid[a:b:100j,a:b:100j]\n",
    "    fmesh = f(np.array([xmesh, ymesh]))\n",
    "    \n",
    "    \n",
    "    fig = plt.figure(figsize=(16,4))\n",
    "\n",
    "    ax = fig.add_subplot(131,projection=\"3d\")\n",
    "    ax.plot_surface(xmesh, ymesh, fmesh,cmap=plt.cm.coolwarm);\n",
    "    plt.title('3d plot of f(x,y)')\n",
    "\n",
    "    ax = fig.add_subplot(132)\n",
    "    ax.set_aspect('equal')\n",
    "    c = ax.contour(xmesh, ymesh, fmesh, levels=levels)\n",
    "\n",
    "    plt.title('2d countours of f(x,y)')\n",
    "    ax.clabel(c, inline=1, fontsize=10)\n",
    "    \n",
    "    if steps is not None:  \n",
    "        plt.plot(steps.T[0], steps.T[1], \"o-\", lw=3, ms=10)\n",
    "     \n",
    "    if fhist is not None:\n",
    "        ax = fig.add_subplot(133)\n",
    "        plt.semilogy(fhist, '-o')\n",
    "        plt.xlabel('iteration')\n",
    "        plt.ylabel('f')\n",
    "        plt.grid()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plotFunction(f1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plotFunction(f2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plotFunction(f3,levels=np.logspace(0,4,10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plotConvergence( steps, exact , r ):\n",
    "       \n",
    "    error = la.norm(np.array(steps) - np.array(exact),axis=1)\n",
    "    ratio = []\n",
    "    for k in range(len(error)-1):\n",
    "        ratio.append( error[k+1]/error[k]**r )\n",
    "\n",
    "    fig = plt.figure(figsize=(4,4))\n",
    "\n",
    "    plt.plot(ratio, \"o-\", lw=3, ms=10)\n",
    "    plt.ylim(0,2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A) Steepest Descent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def steepestDescent(f,df,x0,maxiter,tol,alpha = 0):\n",
    "\n",
    "    \n",
    "    steps = [x0]   \n",
    "    x = x0\n",
    "    fhist = [f(x)]\n",
    "    \n",
    "    # Steepest descent with line search\n",
    "    for i in range(maxiter):\n",
    "\n",
    "\n",
    "        # complete the code\n",
    "\n",
    "\n",
    "        \n",
    "    print('optimal solution is:', x)\n",
    "        \n",
    "    return steps, fhist, i   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial guess\n",
    "x0 = np.array([2, 2./5])\n",
    "# Steepest descent\n",
    "steps, fhist, iterations = steepestDescent(f1,df1,x0,50,1e-6)\n",
    "print('converged in', iterations, 'iterations')\n",
    "# Plot convergence   \n",
    "plotFunction(f1,steps=np.array(steps),fhist=np.array(fhist))\n",
    "\n",
    "plotConvergence( steps, [0,0] , 1 )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial guess\n",
    "x0 = np.array([-1.5, -1])\n",
    "# Steepest descent\n",
    "steps, fhist, iterations = steepestDescent(f2,df2,x0,50,1e-4)\n",
    "print('converged in', iterations, 'iterations')\n",
    "# Plot convergence   \n",
    "plotFunction(f2,steps=np.array(steps),fhist=np.array(fhist))\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial guess\n",
    "x0 = np.array([0, 1.75])\n",
    "# Steepest descent\n",
    "steps, fhist, iterations = steepestDescent(f3,df3,x0,1000,1e-6)\n",
    "print('converged in', iterations, 'iterations')\n",
    "# Plot convergence   \n",
    "plotFunction(f3,steps=np.array(steps),levels=np.logspace(0,4,8), fhist=np.array(fhist))\n",
    "\n",
    "plotConvergence( steps, [1 , 1] , 1 )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial guess\n",
    "x0 = np.array([-0.5, -1])\n",
    "# Steepest descent\n",
    "steps, fhist, iterations = steepestDescent(f3,df3,x0,1000,1e-6)\n",
    "print('converged in', iterations, 'iterations')\n",
    "# Plot convergence   \n",
    "plotFunction(f3,steps=np.array(steps),levels=np.logspace(0,4,8), fhist=np.array(fhist))\n",
    "\n",
    "plotConvergence( steps, [1 , 1] , 1 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### B) Newton's method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def NewtonMethod(f,df,ddf,x0,maxiter,tol):\n",
    "    \n",
    "    steps = [x0]   \n",
    "    x = x0\n",
    "    fhist = [f(x)]\n",
    "    \n",
    "    # Steepest descent with line search\n",
    "    for i in range(maxiter):\n",
    "\n",
    "        # complete the code\n",
    "        \n",
    "    print('optimal solution is:', x)\n",
    "        \n",
    "    return steps, fhist, i "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial guess\n",
    "x0 = np.array([2, 2./5])\n",
    "# Newton's method\n",
    "steps, fhist, iterations = NewtonMethod(f1,df1,ddf1,x0,50,1e-6)\n",
    "print('converged in', iterations, 'iterations')\n",
    "# Plot convergence   \n",
    "plotFunction(f1,steps=np.array(steps),fhist=np.array(fhist))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial guess\n",
    "x0 = np.array([-1, -1.0])\n",
    "# Newton's method\n",
    "steps, fhist, iterations = NewtonMethod(f2,df2,ddf2,x0,50,1e-6)\n",
    "print('converged in', iterations, 'iterations')\n",
    "# Plot convergence   \n",
    "plotFunction(f2,steps=np.array(steps),fhist=np.array(fhist))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial guess\n",
    "x0 = np.array([-0.5, -1])\n",
    "# Newton's method\n",
    "steps, fhist, iterations = NewtonMethod(f3,df3,ddf3,x0,50,1e-8)\n",
    "print('converged in', iterations, 'iterations')\n",
    "# Plot convergence   \n",
    "plotFunction(f3,steps=np.array(steps),levels=np.logspace(0,4,8), fhist=np.array(fhist))\n",
    "\n",
    "plotConvergence( steps, [1 , 1] , 2 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4) Location of Cities\n",
    "http://www.benfrederickson.com/numerical-optimization/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, given data on the distance between different cities, we want map out the cities by finding their locations in a 2-dimensional coordinate system.\n",
    "\n",
    "Below is an example of distance data that we may have available that will allow us to map a list of cities."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Table of Distances Between Various Cities](figures/city_table.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load the data that we will be working with in this example.\n",
    "`city_data` is an $n \\times n$ numpy array where $n$ is the number of cities. `city_data` will store the table of distances between cities similar to the one above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "city_data = np.load('additional_files/city_data.npy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we start working with the data, we need to normalize the data by dividing by the largest element. This will allow us to more easily generate an initial guess for the location of each city."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "city_data = city_data/np.max(city_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a list of cities that we want to locate on a map."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "city_names = [\n",
    "    \"Vancouver\", \n",
    "    \"Portland\", \n",
    "    \"New York\", \n",
    "    \"Miami\", \n",
    "    \"Mexico City\", \n",
    "    \"Los Angeles\", \n",
    "    \"Toronto\", \n",
    "    \"Panama City\", \n",
    "    \"Winnipeg\", \n",
    "    \"Montreal\", \n",
    "    \"San Francisco\", \n",
    "    \"Calgary\", \n",
    "    \"Chicago\", \n",
    "    \"Halifax\", \n",
    "    \"New Orleans\", \n",
    "    \"Saskatoon\", \n",
    "    \"Guatemala City\", \n",
    "    \"Santa Fe\", \n",
    "    \"Austin\", \n",
    "    \"Edmonton\", \n",
    "    \"Washington\", \n",
    "    \"Phoenix\", \n",
    "    \"Atlanta\", \n",
    "    \"Seattle\", \n",
    "    \"Denver\"\n",
    "  ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similar to the first example, the first step is to define the function that we need to minimize."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to formulate this problem is using the following loss function:\n",
    "\n",
    "$$\n",
    "loss({\\bf X}) = \\sum_i \\sum_j (({\\bf X}_i - {\\bf X}_j)^T({\\bf X}_i - {\\bf X}_j) - D_{ij}^2)^2\n",
    "$$\n",
    "\n",
    "- ${\\bf X}_i$ and ${\\bf X}_j$ are the positions for cities $i$ and $j$. Each position ${\\bf X}$ has two components, the $x$ and $y$ coordinates. ** These are the variables we want to find! **\n",
    "\n",
    "- $({\\bf X}_i - {\\bf X}_j)^T({\\bf X}_i - {\\bf X}_j)$ is the squared-distance between cities $i$ and $j$, given the positions ${\\bf X}_i$ and ${\\bf X}_j$.\n",
    "\n",
    "- $D_{ij}$ is the known distance between cities $i$ and $j$. ** These are the given (known) variables provided in `city_data`.**\n",
    "\n",
    "The loss function measures how much the actual location and the guess location differ. The optimization problem becomes:\n",
    "\n",
    "$$ \\min_{{\\bf X}} loss({\\bf X}) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assume that the location of cities is stored as `city_loc`, a 1D numpy array of size $2n$, such that the x-coordinate of a given city is stored first followed by it's y-coordinate.\n",
    "\n",
    "I.e.\n",
    "$$\n",
    "\\text{city_loc} = \n",
    "\\begin{bmatrix}\n",
    "X_1[0]\\\\\n",
    "X_1[1]\\\\\n",
    "\\vdots \\\\\n",
    "X_n[0]\\\\\n",
    "X_n[1]\n",
    "\\end{bmatrix}\n",
    "$$\n",
    "\n",
    "For example, if we had the cities Los Angeles, San Francisco and Chicago with their locations $(0.2,0.1),(0.2,0.5),(0.6,0.7)$, respectively, then `city_loc` would be\n",
    "$$\n",
    "\\text{city_loc} = \n",
    "\\begin{bmatrix}\n",
    "0.2\\\\\n",
    "0.1\\\\\n",
    "0.2 \\\\\n",
    "0.5\\\\\n",
    "0.6\\\\\n",
    "0.7\n",
    "\\end{bmatrix}\n",
    ".\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The objective functin is defined below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def loss(city_loc, city_data):\n",
    "    totalLoss = 0.\n",
    "    n = len(city_loc)//2\n",
    "    \n",
    "    for i in range(n):\n",
    "        for j in range(n):\n",
    "            xij = city_loc[2*i:2*i+2] - city_loc[2*j:2*j+2]\n",
    "            totalLoss += ( np.inner(xij,xij) - city_data[i,j]**2)**2\n",
    "    return totalLoss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have the function that we want to minimize, we need to compute it's gradient to use steepest descent.\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "\\frac{\\partial loss}{\\partial X_k} =& \\sum_i -4 ((X_i-X_k)^T(X_i-X_k) - D_{ik}^2) (X_i-X_k)\n",
    "\\\\+& \\sum_j 4 ((X_k-X_j)^T(X_k-X_j) D_{kj}^2) (X_k-X_j)\n",
    "\\end{align}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gradientLoss(city_loc, city_data):\n",
    "    n = len(city_loc)\n",
    "    grad = np.zeros(n)\n",
    "    \n",
    "    for k in range(n//2):\n",
    "        for i in range(n//2):\n",
    "            xik = city_loc[2*i:2*i+2] - city_loc[2*k:2*k+2]\n",
    "            grad[2*k:2*k+2] += -4 * (np.dot(xik,xik) - city_data[i,k]**2) * xik\n",
    "\n",
    "        for j in range(n//2):\n",
    "            xkj = city_loc[2*k:2*k+2] - city_loc[2*j:2*j+2]\n",
    "            grad[2*k:2*k+2] += 4 * (np.dot(xkj,xkj) - city_data[k,j]**2) * xkj\n",
    "    return grad"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We should now have everything that we need to use steepest descent if we use a learning rate instead of a line search parameter.\n",
    "\n",
    "#### Steepest descent using learning rate\n",
    "Write a function to run steepest descent for this problem. \n",
    "\n",
    "- Store the solution after each iteration of steepest descent in a list called `city_loc_history`. To make plotting easier, we will reshape `city_loc` when we store it so that it is of shape $n \\times 2$ instead of $2n$.\n",
    "\n",
    "- Store the loss function after each iteration in a list called `loss_history`.\n",
    "\n",
    "- Your algorithm should not exceed a given maximum number of iterations. \n",
    "\n",
    "- In addition, you should add a convergence stopping criteria. Here assume that the change in the loss function from one iteration to the other should be smaller than a given tolerance `tol`.\n",
    "\n",
    "```python\n",
    "def steepest_descent(city_loc, learning_rate, city_data, num_iterations, tol):\n",
    "    # compute num_iterations of steepest_descent\n",
    "    city_loc_history = [city_loc.reshape(-1,2)]\n",
    "    loss_history = []\n",
    "    \n",
    "    for i in range(num_iterations):\n",
    "    \n",
    "        # write step of steepest descent here\n",
    "        \n",
    "        loss_history.append( ... )\n",
    "       \n",
    "        city_loc_history.append(city_loc.reshape(-1,2))\n",
    "        \n",
    "        if (check if tolerance is reached):\n",
    "            break\n",
    "        \n",
    "    return city_loc_history, loss_history\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def steepest_descent(city_loc, learning_rate, city_data, num_iterations, tol):\n",
    "    city_history = [city_loc.reshape(-1,2)]\n",
    "    loss_history = []\n",
    "    for i in range(num_iterations):\n",
    "        loss_history.append( loss(city_loc, city_data) )\n",
    "        city_loc = city_loc - learning_rate * gradientLoss(city_loc, city_data)\n",
    "        city_history.append(city_loc.reshape(-1,2))\n",
    "        if (i > 0):\n",
    "            error = abs(loss_history[-2] - loss_history[-1])\n",
    "            if ( error < tol ) :\n",
    "                break\n",
    "    return city_history, loss_history"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using your `steepest_descent` function, find the location of each city.\n",
    "Use a random initial guess for the location of each of the cities and use the following parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "learning_rate = 0.005\n",
    "num_iterations = 300\n",
    "tol = 1e-8\n",
    "city_loc = np.random.rand(2*city_data.shape[0])\n",
    "\n",
    "city_loc_history, loss_history = steepest_descent(city_loc, learning_rate, city_data, num_iterations, tol)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the `np.dstack` function to change the list of city locations into a numpy array.\n",
    "We will have a 3D numpy array with dimensions $n \\times 2 \\times num\\_iterations$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "city_loc_history = np.dstack(city_loc_history)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's display the location of each of the cities.\n",
    "Note, you can use `plt.text` to display the name of the city on the plot next to its location instead of using a legend. The following code snippet will plot the final location of the cities if we assume that we stored the result of steepest descent as `city_loc_history`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_cities = city_loc_history.shape[0]\n",
    "\n",
    "for i in range(num_cities):\n",
    "    plt.scatter(city_loc_history[i,0,-1], city_loc_history[i,1,-1])\n",
    "    plt.text(city_loc_history[i,0,-1], city_loc_history[i,1,-1], city_names[i])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Does your plot make sense? Keep in mind that we aren't keeping track of orientation (we don't have a fixed point for the origin) so you may need to \"rotate\" or \"invert\" your plot (mentally) for it to make sense.  Or if you want an extra challenge for after class, you can modify the optimization problem to include a fixed origin :-)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can plot the history of the loss function, and observe the convergence of the optimization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(loss_history, label = 'loss')\n",
    "plt.xlabel('iteration')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's also include how location of each of the cities changes throughout the optimization iterations. The following code snippet assumes that the output for steepest descent is stored in `city_loc_history` and is a numpy array of shape $n \\times 2 \\times num\\_iterations$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_cities = city_loc_history.shape[0]\n",
    "\n",
    "for i in range(num_cities):\n",
    "    plt.plot(city_loc_history[i,0,:], city_loc_history[i,1,:])\n",
    "    plt.text(city_loc_history[i,0,-1], city_loc_history[i,1,-1], city_names[i])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plot looks too cluttered, right?\n",
    "Repeat the same experiment but use a smaller number of cities. But don't forget to normalize the smaller data set. Use 10 cities for this smaller experiment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "city_data2 = city_data[:10,:10]/np.max(city_data[:10,:10])\n",
    "city_loc2 = np.random.rand(2*city_data2.shape[0])\n",
    "\n",
    "learning_rate = 0.01\n",
    "num_iterations = 300\n",
    "tol = 1e-8\n",
    "\n",
    "city_loc_hist2, loss_hist_2 = steepest_descent(city_loc2, learning_rate, city_data2, num_iterations,tol)\n",
    "city_loc_hist2 = np.dstack(city_loc_hist2)\n",
    "\n",
    "num_cities = city_loc_hist2.shape[0]\n",
    "for i in range(num_cities):\n",
    "    plt.plot(city_loc_hist2[i,0,:], city_loc_hist2[i,1,:])\n",
    "    plt.text(city_loc_hist2[i,0,-1], city_loc_hist2[i,1,-1], city_names[i])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see how the loss changes after each iteration. Plot the `loss_history` variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(loss_hist_2, label = 'loss')\n",
    "plt.xlabel('iteration')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Repeat the same experiment but try different values for the learning rate.\n",
    "What happens when we decrease the learning rate? Do we need more iterations?\n",
    "What happens when we increase the learning rate? Do we need more or less iterations?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Steepest descent with golden-section search\n",
    "\n",
    "Now, let's use golden-section search again for computing the line search parameter instead of a learning rate.\n",
    "\n",
    "Wrap the function that we are minimzing so that $\\alpha$ is a parameter. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def objective_1d(alpha, city_loc, city_data):\n",
    "    return loss(city_loc - alpha * gradientLoss(city_loc,city_data), city_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Rewrite your steepest descent function so that it uses `scipy.optimize.golden`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sd_line(city_loc, city_data, num_iterations, tol):\n",
    "    city_history = [city_loc.reshape(-1,2)]\n",
    "    loss_history = []\n",
    "    for i in range(num_iterations):\n",
    "        \n",
    "        alph = sopt.golden(objective_1d, args=(city_loc,city_data))\n",
    "        \n",
    "        loss_history.append( loss(city_loc, city_data) )\n",
    "        \n",
    "        city_loc = city_loc - alph * gradientLoss(city_loc, city_data)\n",
    "        \n",
    "        city_history.append(city_loc.reshape(-1,2))\n",
    "        \n",
    "        if (i > 0):\n",
    "            error = abs(loss_history[-2] - loss_history[-1])\n",
    "            if ( error < tol ) :\n",
    "                break\n",
    "                \n",
    "    return city_history, loss_history\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use your new function for steepest descent with line search to find the location of the cities and plot the result. You can use the code snippets for plotting that we provided above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_iterations = 300\n",
    "tol = 1e-8\n",
    "\n",
    "city_hist3, loss_hist3 = sd_line(city_loc2, city_data2, num_iterations, tol)\n",
    "city_hist3 = np.dstack(city_hist3)\n",
    "\n",
    "num_cities = city_hist3.shape[0]\n",
    "\n",
    "for i in range(num_cities):\n",
    "    plt.plot(city_hist3[i,0,:], city_hist3[i,1,:])\n",
    "    plt.text(city_hist3[i,0,-1], city_hist3[i,1,-1], city_names[i])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What do you notice about how the solution evolves? Do you expect using a line search method that the solution will converge faster or slower? Will using a line search method increase the cost per iteration of steepest descent?\n",
    "\n",
    "Plot the loss function at each iteration to see if it converges in fewer number of iterations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.semilogy(loss_hist3, label = 'loss')\n",
    "plt.xlabel('iteration')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using scipy function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now compare the functions you defined above with the `minimize` function from `scipy`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import scipy.optimize as sopt\n",
    "\n",
    "num_cities = 6\n",
    "tol = 1e-5\n",
    "\n",
    "city_data_small = city_data[:num_cities,:num_cities]/np.max(city_data[:num_cities,:num_cities])\n",
    "x0 = np.random.rand(2*city_data_small.shape[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Providing only the function to the optimization algorithm \n",
    "# Note that it takes a lot more function evaluations for the optimization, since\n",
    "# gradient and Hessians are approximated in the backend\n",
    "res1 = sopt.minimize(loss, x0, args=(city_data_small), tol=tol )\n",
    "xopt1 = res1.x.reshape(num_cities,2)\n",
    "print(xopt1)\n",
    "print('Optimized loss value is ', res1.fun)\n",
    "print('converged in', res1.nit, 'iteration with', res1.nfev, 'function evaluations' )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Providing function and gradient to the optimization algorithm\n",
    "# Note that the number of function evaluations are reduced, since only Hessian are now approximated\n",
    "res2 = sopt.minimize(loss, x0, args=(city_data_small) , jac=gradientLoss, tol = tol )\n",
    "xopt2 = res2.x.reshape(num_cities,2)\n",
    "print(xopt2)\n",
    "print('Optimized loss value is ', res2.fun)\n",
    "print('converged in', res2.nit, 'iteration with', res2.nfev, 'function evaluations' )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xhist,loss_hist = sd_line(x0, city_data_small, 400, tol)\n",
    "xopt3 = xhist[-1]\n",
    "print(xopt3)\n",
    "print('Optimized loss value is ', loss_hist[-1])\n",
    "print('convergend in ', len(loss_hist),'iterations')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xhist,loss_hist = steepest_descent(x0, 0.01,  city_data_small, 400, tol)\n",
    "xopt3 = xhist[-1]\n",
    "print(xopt3)\n",
    "print('Optimized loss value is ', loss_hist[-1])\n",
    "print('convergend in ', len(loss_hist),'iterations')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
